Summary

The public telephone network has been found convenient for making speech 
contributions to programmes. However, using normal handsets the quality achieved is 
often poor by broadcasting standards. To enable an improved quality to be obtained 
from reporters' contributions, more sophisticated equipment might be used to replace 
the normal telephone handset. 

The report discusses methods of sending digital information over the public 
telephone network and reviews techniques for digitally coding speech at low bit-rates. It 
concludes that a good quality speech channel cannot be derived digitally for the present 
system, using currently available devices, but new advances and development of the Post 
Office digital network might enable this to be done in some years' time.

THE POSSIBILITY OF SENDING PROGRAMME SPEECH CONTRIBUTIONS
DIGITALLY OVER THE PUBLIC TELEPHONE NETWORK

1. Introduction

It is often required to make a speech contribution for 
broadcasting from a remote position either in this country 
or abroad. Such contributions are used frequently in news 
programmes and in programmes where listeners are asked 
to contribute directly to a particular programme. 

The public telephone network has been found con- 
venient for making such contributions, using the readily 
available telephone facilities to contact the studio and a 
standard telephone handset to make the contribution. At 
the studio, special equipment can be used to process the 
incoming signal and to provide a means of communicating 
with the contributor. 

The quality of speech contributions made in this way 
is poor by broadcasting standards and is limited by the 
performance of the handset and the communication 
channel. Investigations have shown that only a small 
improvement in quality can be obtained by processing the 
incoming signal. 

A possibly more rewarding approach would be to 
devise equipment which could be used to replace the tele- 
phone handset and to generate a signal which would be less 
susceptible to the impairments of the communication 
channel. In practice the application of such equipment 
would be limited to those positions where it is convenient 
to make the necessary electrical connections from the new 
equipment to the telephone circuit and where approval 
from the appropriate Telecommunications Administration 
or PTT can be granted for this. Hence such equipment 
might be usable for reporters making contributions from 
both within and outside this country, but its use would be 
ruled out where listeners contribute directly to a pro- 
gramme. Moreover, if such a speech coding system could 
be devised, it might enable commentaries to be transmitted 
via the public telephone network. This would be more 
economic and convenient than using the commentary 
circuits available at present.

In this report solutions involving the use of digits and 
digital speech transmission are considered. Methods of 
deriving digital circuits from the public telephone network 
are described and methods of digitally coding speech at low 
bit-rates are reviewed. 



2. Digital capacity of voice-grade telephone circuits*

* The work reported in this section was carried out by R.V. Harvey
and M.J. Kaliaway.

2.1. General

The public telephone network is primarily intended
for transmitting speech signals and has parameters which
exploit certain insensitivities in the human ear. For 
instance, large phase distortions can be tolerated when 
speech signals are being transmitted at baseband. However,
if speech is being transmitted in digital form, phase distortions
are important because they cause interference between
successive digits. Therefore, to achieve the maximum rate
of digit transmission some form of correction for both 
amplitude and phase distortion must be used. 

To obtain a digital circuit from a voice-grade tele- 
phone circuit it is necessary to modulate the digital signal 
so that it occupies the range of frequencies where the 
impairments are a minimum. The bandwidth of most 
telephone circuits is at most 3.1 kHz (from 300 Hz to 3.4
kHz). The distortions are minimum at the centre and
increase towards the edges of this band. The typical
bandwidth usable for digit transmission is about 2.4 kHz.
It therefore follows that the higher the bit-rate to be trans- 
mitted, the wider the bandwidth that will have to be used, 
and the degree of correction of the impairments must then 
be higher. A device that modulates the digital signals 
considered in combination with a complementary device 
that demodulates them at the receiving terminal — possibly 
also providing some degree of correction for the impair- 
ments of the circuit — is called a modem. 

2.2. Currently available modems 

Modems currently available from the Post Office in 
the UK give a range of bit-rates up to 2.4 kb/s and there
are CCITT recommendations for the designs of modems
operating up to this bit-rate. Commercial modems are
available giving bit-rates up to 9.6 kb/s over suitable
circuits. 

For operation at bit-rates higher than about 1.2 kb/s,
digital channels cannot reliably be derived from all public 
switched telephone network circuits. Some circuits have 
sufficient impairments to cause a large number of errors to 
be generated in the received signal. To achieve a reliable 
connection at the higher rates it is necessary to resort to 
renting private circuits which have an adequate specified 
performance. 

2.3. Theoretical maximum bit-rate on available voice-grade
telephone circuits

In a recent study* the information available describing
the impairments of voice-grade telephone circuits 6,7,8 was
examined, together with the types of modulation systems
which could be used. It was assumed that bit error probabilities
up to 10^-5 could be tolerated for low bit-rate
speech transmission. The degree of equalisation of the
amplitude and phase characteristics required for the different
modulation systems was also assessed.

* Unpublished work by M.J. Kaliaway.

This study concluded that, using the public switched
telephone network and no equalisation, the highest usable 
bit-rate would be about 1.2 kb/s. With equalisation this
figure can be increased to 2.4 kb/s for most lines. With a
privately-rented circuit the figure for no equalisation could
be 2.4 kb/s. With various forms of simple equaliser the
bit-rate could be increased to about 4.8 kb/s and with a
complex adaptive equaliser it seems that about 9.6 kb/s is
the maximum that could be achieved for a private line.

Various quotations and estimates for the cost of these 
systems (2 terminals) have been obtained and used to give 
the curve shown in Fig. 1. An important factor in interpreting
the information shown in Fig. 1 is that, for our
application, we are only considering contribution, not two-way
traffic, and that the cost of the transmitting part of the
equipment is less than half the total cost of the equipment.
The receiving equipment, which would be at the studio
centre, would include the equalisers. For a 9.6 kb/s modem
the cost of the modulator would be about one fifth of the cost
of the demodulator. So it might be possible to justify use
of the more costly systems for our particular application 
where there could be a large number of transmitters but 
only a few receiving terminals. 
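
The weighting of cost towards the receiving terminal can be illustrated numerically. The following Python sketch is illustrative only: the function names, the cost unit and the numbers of terminals are hypothetical, and only the one-fifth ratio of modulator to demodulator cost is taken from the text above.

```python
from fractions import Fraction


def transmitter_share(mod_to_demod_ratio):
    """Fraction of the cost of one terminal pair taken by the modulator."""
    return mod_to_demod_ratio / (1 + mod_to_demod_ratio)


def fleet_cost(pair_cost, n_transmitters, n_receivers, mod_to_demod_ratio):
    """Total cost of a contribution system with unequal terminal numbers.

    pair_cost is the cost of one modulator plus one demodulator, in any unit.
    """
    share = transmitter_share(mod_to_demod_ratio)
    return pair_cost * (n_transmitters * share + n_receivers * (1 - share))


if __name__ == "__main__":
    ratio = Fraction(1, 5)  # modulator costs one fifth of the demodulator
    print("transmitter share of a pair:", transmitter_share(ratio))
    # Hypothetical system: 10 reporters' transmitters, 2 studio receivers,
    # pair cost of 6 arbitrary units.
    print("system cost:", fleet_cost(6, 10, 2, ratio))
```

On this assumption the modulator accounts for one sixth of the cost of a pair of terminals, so adding transmitters increases the total cost much more slowly than adding receivers.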



3. Transmission at higher bit-rates 

3.1. General 

It was reported in Section 2 that, using voice-grade circuits
and special modems, the absolute maximum bit-rate
that could be transmitted would be about 10 kb/s. However,
the Post Office does at present offer some digital
circuits in the UK, and it is likely that digit-transmission
facilities will be widespread in the future. These circuits
might therefore also be considered by the BBC for making 
speech contributions to programmes. 

3.2. Datel 48K

One of the higher bit-rate circuits at
present being offered by the BPO is the Datel 48K 48 kb/s
circuit. In this system, a special leased facility, an entire
12-channel telephone group band of 48 kHz is given over to
the transmission of digits. 9 Although this facility is available
in some areas, it is expensive because of the number
of telephone channels displaced. Also, use of this system 
might in the future be less convenient than using a part of 
the Post Office integrated digital network when this becomes 
widely available. 

3.3. Possible 64 kb/s circuits 

The British Post Office, in common with other 
European telephone authorities, is continuing to develop 
digital circuits for transmission of telephone speech signals. 10
Present proposals are for 30 telephone channels to
be derived from a first order multiplex rate of 2048 kb/s. 
Here each telephone channel would be digitally coded pro- 
ducing, in effect, a bit-rate of 64 kb/s for each channel. 
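
The relationship between the multiplex rate and the channel rate can be checked with a few lines of arithmetic. The Python sketch below uses only the figures quoted above (2048 kb/s, 30 channels, 64 kb/s); the constant and function names are hypothetical, and the remark that the remaining capacity is used for framing and signalling is an assumption based on usual 30-channel p.c.m. practice rather than a statement made in this report.

```python
MULTIPLEX_RATE_KBPS = 2048   # first-order multiplex rate quoted in the text
CHANNEL_RATE_KBPS = 64       # bit-rate of one coded telephone channel
SPEECH_CHANNELS = 30         # telephone channels derived from the multiplex


def time_slots():
    """Number of 64 kb/s slots that fit into the multiplex."""
    return MULTIPLEX_RATE_KBPS // CHANNEL_RATE_KBPS


def spare_capacity_kbps():
    """Capacity not used for speech (assumed: framing and signalling)."""
    return MULTIPLEX_RATE_KBPS - SPEECH_CHANNELS * CHANNEL_RATE_KBPS


def grouped_rate_kbps(n_channels):
    """Bit-rate obtained by grouping several 64 kb/s channels together."""
    return n_channels * CHANNEL_RATE_KBPS


if __name__ == "__main__":
    print("slots in multiplex:", time_slots())
    print("spare capacity (kb/s):", spare_capacity_kbps())
    print("two channels grouped (kb/s):", grouped_rate_kbps(2))
```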

In the proposed nationwide network it is not yet
clear whether convenient access to 64 kb/s digital channels
will be offered. It has been suggested 11 that developments
in m.o.s. circuitry might make it economic to provide each
subscriber with a digital encoder as part of the normal
telephone equipment. There could then be rapid development
towards a situation where all telephone connections
would be of a digital nature and access to 64 kb/s channels
might be obtained at almost any telephone terminal.
Alternatively, if digital encoders must be sited at exchanges 
it might still be possible to obtain access to the digital 
channels, although here special local connections would be 
required and special facilities for handling the digital signals 
at exchanges might be necessary. Either way, transmitting 
data or digitally coded speech contributions (for which a 
higher error rate than for computer terminals is acceptable) 
within the planned integrated digital network would be 
more convenient, and could well be more economic, than 
using the Datel 48K service. Use of such a 64 kb/s service 
might also enable several channels to be grouped together 
to give an even higher transmission rate if required. 

3.4. Other developments by the BPO for providing 
digital circuits 

Recent work at the Post Office Research Department
has indicated that a radically different approach to
local distribution might enable convenient access to digital
communication circuits at a variety of bit-rates. In this
proposal the use of high bit-rate data highway rings is 
advocated. These rings would be equipped with digital 
multiplexers and concentrators at convenient intervals to 
allow access from all premises to a wide range of circuits at 
bit-rates up to those required for visual telephone connec- 
tions. 

If this development were to take place it would
radically change communications on a local basis and overcome
many of the problems of speech contributions discussed
in this report. However, such systems would not be
available on a sufficiently widespread basis for a decade or 
more because of the large capital investment represented 
by the existing telephone system. Therefore, while it is 
interesting to note that such a development might eventually 
solve local-access problems, other solutions that might be 
available in the more immediate future should be sought. 



4. Summary of possible digit-transmission rates for 
digital speech contributions 

For our application of digital speech contributions 
made via the public telephone network the availability of 
circuits would be extremely important. From this point of 
view the most readily available circuit is a dial-up telephone 
circuit. It was stated in Section 2 that the highest bit-rate 
that can be derived reliably from such a circuit would be 
only 1.2 kb/s. For all other facilities discussed in Sections
2 and 3, special arrangements would have to be made with 
the Post Office. Of these, the most useful facility for 
speech contribution, when it is available, will be the 64 kb/s 
facility. Meanwhile it might be possible to gain advantage, 
in some circumstances, using a private circuit equipped with 
digital modems to give 9.6 kb/s.



5. Digital speech coding systems

5.1. General

In this investigation many speech coding devices have
been considered for coding speech at sufficiently low bit-rates
to take advantage of either 1.2 kb/s, 9.6 kb/s or 64
kb/s digital circuits. Some of the devices have already been
developed and are available either as a design or a commercial
device. Others are still being developed and might,
in the foreseeable future, be readily available.

In some of these devices emphasis has been placed
principally on reducing the bandwidth of the signals to be
transmitted whether in analogue or digital form. In these
cases it is necessary to consider the bit-rate necessary to
convey the various low-frequency control signals.

The various types of speech coding devices can be
broadly categorised as either vocoders or waveform coding
devices. In a vocoder the speech is analysed to derive a
number of parameters which are transmitted and used, at
the receiving terminal synthesiser, to reproduce sounds
which are subjectively similar to the original speech. In a
waveform coding device attempts are made to reproduce
the original speech waveform on a point by point basis as in
a pulse code modulation (p.c.m.) system.

The subjective impairments produced by the two types
of speech coding devices when operating at low bit-rates
differ considerably. A vocoder produces speech-like
sounds which are, for the main part, clear and intelligible
and not accompanied by any undue electronic noise or
non-linear distortion. When a vocoder impairs the signal
the output is less like human speech and not all types of
speech sound may be clearly identifiable. Examples of
this would be that the speech might sound as if the speaker
were talking through a metal pipe; alternatively the speech
might sound as if it had been segmented and some artificial
tones inserted in place of some of the speech segments.

Waveform coding devices cause impairments which are
more readily identified as the introduction of noise or forms
of non-linear distortion. In some cases a steady background
noise is audible; in others, noise which changes in character
as the applied signal changes can occur. Various forms of
harmonic and intermodulation distortion may also be
present; intermodulation products may occur involving
discrete frequencies, as when a p.c.m. signal is sampled at
too low a frequency.

In cost and complexity the two categories of coder
vary considerably. Vocoders are, in general, very sophisticated
and costly, whereas most waveform coding devices
are relatively simple to implement.

In the following sections of this report a number of
speech coding devices are described. For each device some
indication of performance, cost and complexity will also be
included.

5.2. Vocoders

5.2.1. Formant tracking vocoders

Of the different speech coding devices at present
being investigated the formant tracking vocoder is the most
fundamental and requires a complete understanding of the
natural speech production process. In this type of vocoder
speech is analysed to derive the values of a number of parameters
which describe exactly what the component parts of
the human vocal system did to produce the speech sounds.
At the receiving terminal a synthesiser is used to simulate
electronically the action of the vocal system. The analysis
and synthesis techniques are described fully in References
13, 14 and 15. Here, for brevity, we shall describe an
electronic speech synthesiser only, and simply define a
speech analyser as a collection of devices which derives the
values of speech parameters separately to drive the synthesiser.

Fig. 2 - A series-formant speech synthesiser

A schematic for a series formant speech synthesiser is 
shown in Fig. 2. Sounds produced by the larynx are 
simulated using an electronic pulse generator whose frequency
is controlled by parameter F0. The output is
varied in amplitude (A0) and applied to variable frequency
formant generating filters (F1, F2 and F3) and a fixed
frequency filter. These filters simulate the effects of the
vocal tracts. Fricative sounds, i.e. sibilants and the main
part of most consonants, are simulated using an electronic
noise generator whose output is modulated in amplitude
(Ah1) and applied to a variable frequency filter (Fh) to
simulate the effects of the vocal tracts. The other paths
allow a controlled amount of nasalisation (An) and 'hiss
through formants' (Ah2).



Much work has been carried out developing formant 
tracking vocoders similar to that described. It has been 
found by some workers that a more complex arrange- 
ment, where the formant paths are in parallel rather than in 
series, allows more accurate speech synthesis giving more 
natural speech sounds. However, investigations are con- 
tinuing, mainly using computer simulations, to devise the 
best form of speech synthesiser and to perfect analysing 
techniques. At present it is not possible to estimate how 
high a quality will be achieved with such systems, although, 
in principle, it should be possible to achieve extremely high 
quality wideband speech. 



In a low bit-rate speech coding system using a formant 
tracking vocoder, each parameter would be digitised before 
being transmitted to the synthesiser. Work on digitising 
these parameters 16 has shown that the digit transmission 
rate might be about 1.5 kb/s although this depends on the
number of parameters being transmitted. 

In some years' time formant tracking vocoders might
be available in a form convenient for our application. 
However, one disadvantage with these devices would be 
that the analyser would probably be much more complex 
than the synthesiser and it might therefore not be practical 
to equip reporters with these. 

5.2.2. Waveform-prediction vocoders 

A waveform-prediction vocoder is similar to a 
formant tracking vocoder in that the synthesiser is of the 
same general form differing only in the implementation of 
the filters which simulate the vocal tracts. The main dif- 
ference between the two is that the analyser, rather than 
being a collection of devices deriving the values of the 
different parameters separately, is a digital filter whose 
coefficients are adapted to give the best prediction of the 
speech waveform. The values of these coefficients can be 
processed to generate similar speech parameters to those 
used in the formant tracking vocoder and these can be transmitted
in a similar way. This particular implementation of
a vocoder simplifies the analyser and is particularly suited
to digital techniques. 

The schematic for a waveform-prediction vocoder devised
by Atal and Hanauer is shown in Fig. 3. At the
coder, filter coefficients which give the best prediction of
the speech waveform are calculated and coded for transmission.
Also a coded description of the r.m.s. value of the
waveform is transmitted. The difference between the predicted
waveform and the original is used for detecting the
proportions of voiced and unvoiced components in the
original speech signal and the larynx frequency. At the
decoder, signals from a larynx pulse generator and a white
noise generator are modulated in level and applied to a
transversal filter similar to that used in the coder. The
coefficients derived in the coder are used to adjust this
transversal filter which simulates the effects of the vocal
tracts.

Fig. 3 - A waveform-prediction vocoder
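
The calculation of predictor coefficients in the coder can be sketched as follows. This minimal Python example uses the autocorrelation method with the Levinson-Durbin recursion, which is one standard way of finding a least-squares linear predictor. It is an assumption made for illustration and is not claimed to be the procedure used in the vocoder of Fig. 3; all function names are hypothetical.

```python
import math


def autocorrelation(x, max_lag):
    """Short-term autocorrelation r[0..max_lag] of one block of samples."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Return (coeffs, error) so that x[n] is predicted by
    sum(coeffs[k] * x[n - 1 - k]) with minimum mean-square error."""
    a = [0.0] * (order + 1)          # a[0] is unused
    error = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error              # reflection coefficient for this stage
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        error *= (1.0 - k * k)       # remaining prediction error energy
    return a[1:], error


def prediction_residual(x, coeffs):
    """Difference between each sample and its prediction from past samples."""
    p = len(coeffs)
    return [x[n] - sum(coeffs[k] * x[n - 1 - k] for k in range(p))
            for n in range(p, len(x))]


def test_tone(n_samples=400, cycles_per_sample=0.05):
    """A steady tone, used here as a simple, highly predictable test signal."""
    return [math.sin(2 * math.pi * cycles_per_sample * n)
            for n in range(n_samples)]


if __name__ == "__main__":
    x = test_tone()
    r = autocorrelation(x, 2)
    coeffs, err = levinson_durbin(r, 2)
    residual = prediction_residual(x, coeffs)
    print("coefficients:", coeffs)
    print("residual energy / signal energy:",
          sum(e * e for e in residual) / sum(s * s for s in x))
```

For a highly predictable input such as a steady tone the residual is very much smaller than the signal, which is the property exploited when only the coefficients and a coarse description of the excitation are transmitted.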

The performance of a simulated system (LIP- 
REDER) has shown that fair to good quality speech can 
be achieved at between 3.5 and 7 kb/s and that such a
system would be fairly tolerant to transmission errors. 
This technique is, however, relatively new and still being 
developed. 

5.2.3. Channel vocoders 

Channel vocoders have been in existence for some
time and commercial devices are now available. 19 In
operation they differ from the others mentioned above in
that they are simpler and the analyser is more readily
realised.

Fig. 4 shows a schematic for a channel vocoder. In
the analyser the signal is analysed to measure its amplitude 
in n (typically 15 to 18) frequency bands. Also the degree 
of larynx activity and fricative content is measured in a 
voiced/unvoiced detector, and the larynx pitch is deter- 
mined. Signals describing these amplitudes and parameters 
are digitally coded to form a transmitted digit stream. At 
the receiving terminal the incoming digit stream is de- 
multiplexed, and each signal decoded to drive the syn- 
thesiser. Here signals from a larynx pulse generator and a 
white noise generator are applied to a parallel array of band-pass
filters (similar to those in the analyser) where the
amplitude in each band is controlled. 

Various types of channel vocoder are available commercially,
giving a transmitted bit-rate of either 1.2 or 2.4
kb/s. Their cost is between £5,000 and £10,000.

The performances of two 2.4 kb/s channel vocoders
were assessed in the course of this investigation. This was
done by obtaining recordings of the reconstructed speech
output when a series of recorded speech items were applied 
to the input. The quality of the processed speech from 
both devices was good by telephone standards but it was 
difficult to relate the impairments to those normally present 
in telephone circuits. Both examples of channel vocoder 
produced sounds which were unnatural for much of the 
time, but were clearly intelligible. One of the vocoders 
produced severe pitch errors for part of the time, making 
each recorded speaker sound emotionally disturbed. In 
the other the pitch was also not well defined, but here the 
impairment was a metallic timbre. This latter impairment 
was almost certainly caused, not by pitch errors, but by the 
band-by-band analysis and synthesis of the vocal tracts.
This is one of the fundamental limitations of the channel
vocoder. The natural response of the vocal tracts is peaked,
with a fairly high Q, at the formant frequencies. 20,21 This
response cannot be accurately synthesised by varying the
amplitudes of relatively wide fixed-frequency bands.

Overall, the channel vocoder is, as yet, the only kind 
of vocoder which is commercially available. It also gives 
a transmitted bit-rate sufficiently low to be transmitted 
through dial-up circuits. The quality achieved by some 
implementations could be adequate for some applications. 
Hence it might be worthwhile further investigating this 
possibility for making contributions for broadcasting via 
the public telephone network. 

5.3. Waveform coding devices 

5.3.1. Pulse code modulation 

With linear pulse code modulation (p.c.m.) the
signal is sampled at a rate a little over twice its highest
frequency and then digitally coded; the number of bits per
sample used to code the signal determines the signal-to-quantising-noise
ratio for the system. For a 15 kHz high-quality
sound channel 416 kb/s (32 kHz sample rate; 13
bits per sample) 22 must be transmitted. For an adequate
quality and a 4 kHz bandwidth in a speech contribution
circuit, using linear p.c.m., about 80 kb/s (8 kHz sample
rate; 10 bits per sample) must be transmitted.
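
The bit-rates quoted for linear p.c.m. are simply the product of sample rate and bits per sample, as the short Python sketch below shows. The function names are hypothetical. The signal-to-noise expression included is the usual textbook approximation for a full-scale sinusoid in an ideal uniform quantiser; it is not taken from this report and is given only to indicate how the number of bits per sample sets the quantising noise.

```python
def pcm_bit_rate_kbps(sample_rate_khz, bits_per_sample):
    """Transmitted bit-rate of a linear p.c.m. channel."""
    return sample_rate_khz * bits_per_sample


def linear_pcm_snr_db(bits_per_sample):
    """Textbook approximation (assumption, not from the report):
    SNR of an ideal uniform quantiser for a full-scale sine wave."""
    return 6.02 * bits_per_sample + 1.76


if __name__ == "__main__":
    for label, rate_khz, bits in [("15 kHz high-quality sound", 32, 13),
                                  ("4 kHz speech, linear p.c.m.", 8, 10)]:
        print(label, pcm_bit_rate_kbps(rate_khz, bits), "kb/s,",
              "approx. SNR", round(linear_pcm_snr_db(bits), 1), "dB")
```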

With companding* it is possible to reduce the bit-rate 
required to be transmitted while retaining the same overall 
signal-to-noise ratio but introducing some programme-modulated
noise or distortion. A form of digital companding,
known as 'A law' instantaneous companding, has
been recommended for coding telephone speech signals
in a digital network. Here the sample rate is 8 kHz and 8
bits per sample are used to code the signal; the bit-rate is 
then 64 kb/s. The A law of companding was especially 
chosen for telephone speech because the ratio between the 
noise introduced by coding (quantising noise) and the 
signal is almost constant for more than 40 dB change of 
input signal level. Hence it can easily accommodate the 
very wide range of signal levels of telephone speech. For
our application the signal level can be accurately controlled
at the sending terminal, and therefore other laws could be
used to give either a lower transmitted bit-rate or a channel
bandwidth greater than 4 kHz at the same bit-rate (64 kb/s).

* A system based on the compandor principle; that is, the use of a
compressor at the sending end and a complementary expander at
the receiving end.
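
The near-constant ratio of signal to quantising noise given by A-law companding can be demonstrated with a minimal Python sketch. It uses the continuous form of the A-law characteristic followed by a uniform 8-bit quantiser; the value A = 87.6 is the one commonly associated with A-law coding and is an assumption here, since the report does not quote it. Practical coders use a segmented approximation to this curve, and all function names below are hypothetical.

```python
import math

A = 87.6  # assumed compression parameter; not stated in the report


def a_law_compress(x):
    """Compress a sample x in the range -1..1 with the continuous A law."""
    ax = abs(x)
    if ax < 1.0 / A:
        y = A * ax / (1.0 + math.log(A))
    else:
        y = (1.0 + math.log(A * ax)) / (1.0 + math.log(A))
    return math.copysign(y, x)


def a_law_expand(y):
    """Inverse of a_law_compress."""
    ay = abs(y)
    if ay < 1.0 / (1.0 + math.log(A)):
        x = ay * (1.0 + math.log(A)) / A
    else:
        x = math.exp(ay * (1.0 + math.log(A)) - 1.0) / A
    return math.copysign(x, y)


def quantise(y, bits=8):
    """Uniform quantiser: one sign bit plus (bits - 1) magnitude bits."""
    levels = 2 ** (bits - 1)
    code = min(levels - 1, int(abs(y) * levels))
    return math.copysign((code + 0.5) / levels, y)


def snr_db(amplitude, bits=8, n=2000):
    """Signal-to-quantising-noise ratio for a sine wave of given amplitude."""
    signal = noise = 0.0
    for i in range(n):
        x = amplitude * math.sin(2 * math.pi * 0.01237 * i)
        xq = a_law_expand(quantise(a_law_compress(x), bits))
        signal += x * x
        noise += (x - xq) ** 2
    return 10 * math.log10(signal / noise)


if __name__ == "__main__":
    for amp in (1.0, 0.3, 0.1, 0.03):
        print("amplitude", amp, "SNR (dB)", round(snr_db(amp), 1))
```

Where the input level is held close to full scale, as suggested for the present application, this wide level range is not needed, which is why a different law could trade it for a lower bit-rate or a wider bandwidth.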


Another form of companding, known as near- 
instantaneous companding, has been found particularly 
effective in reducing the bit-rate required for transmitting 
high-quality sound signals. The use of this technique was 
therefore examined for application to lower quality speech 
which has been accurately controlled in level. It was found 
by informal subjective testing that adequate performance 
could be achieved for a 40 Hz to 4 kHz audio band at a 
transmitted bit-rate of about 43 kb/s. Hence, using this 
technique it might be possible to derive a 6 kHz audio 
bandwidth circuit at a 64 kb/s digit transmission rate. In 
these tests it was found that the limit of intelligibility using 
a near-instantaneous companding technique for a 3 kHz 
audio bandwidth was reached when the digit transmission
rate was 9.6 kb/s. The quality at this very low bit-rate was
so poor that it could not be recommended for the applica- 
tion being considered here. 
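
The 6 kHz figure quoted above follows from simple proportion if it is assumed, as a simplification not stated in the report, that the bit-rate scales directly with audio bandwidth (sample rate proportional to bandwidth, with the same average number of bits per sample). The Python sketch below, with hypothetical function names, shows the arithmetic.

```python
def scaled_bit_rate_kbps(ref_rate_kbps, ref_bandwidth_khz, new_bandwidth_khz):
    """Bit-rate needed at a new bandwidth, assuming direct proportionality."""
    return ref_rate_kbps * new_bandwidth_khz / ref_bandwidth_khz


def bandwidth_for_rate_khz(ref_rate_kbps, ref_bandwidth_khz, new_rate_kbps):
    """Audio bandwidth obtainable at a new bit-rate, same assumption."""
    return ref_bandwidth_khz * new_rate_kbps / ref_rate_kbps


if __name__ == "__main__":
    # Reference point from the text: about 43 kb/s for a 4 kHz audio band.
    print("rate for 6 kHz (kb/s):", scaled_bit_rate_kbps(43, 4, 6))
    print("bandwidth at 64 kb/s (kHz):",
          round(bandwidth_for_rate_khz(43, 4, 64), 2))
```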

A low bit-rate 9.6 kb/s speech coding system based on
a form of adaptive p.c.m. has been developed by Wilkinson
of SRDE. 26 Here an audio bandwidth of 2.4 kHz is
achieved and two bits per sample are used to code the signal. 
The significance of the two digits transmitted is adapted 
continuously and, at the receiving terminal, is deduced from 
the transmitted digit stream; it is not transmitted separately 
to the receiving terminal. The advantage of this system is 
that it is simple and cheap to instrument; however, the 
quality achieved was about the same as that achieved with 
the near-instantaneous companding at 9.6 kb/s, and was not
sufficiently high for it to be considered for the present 
application. 

5.3.2. Delta modulation

In Reference 27 it was suggested that delta modula- 
tion might offer some advantages over p.c.m. for coding 
speech signals at low bit-rates. It is simpler to instrument 
than p.c.m. and, at some bit-rates, was found to give up to 
3 dB improvement in signal-to-noise ratio when compared 
to p.c.m. 

A simple delta modulation coder and decoder is shown 
in Fig. 5. The audio signal to be coded is applied to one 
input of a comparator whose output is clocked through a 
flip-flop to provide the coded digit stream. To provide the 
other input to the comparator the digit stream is decoded 
using a simple integrator. At the receiving terminal the 
digit stream is applied to an integrator to give the decoded 
audio output signal. 

This very simple system is, in effect, a one-bit differen- 
tial p.c.m. system which codes the rate of change of the 
signal. As such its main disadvantage is that, instead of 
giving a fixed dynamic range independent of frequency, 
signal amplitudes are limited by the maximum rate of 
change of the signal that can be accommodated (slope 
overload). In practical systems it is normal to use variants 
of this technique which overcome this to some extent. 
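
The behaviour of the simple coder, including slope overload, can be reproduced with a minimal Python sketch. It models the integrator as an ideal accumulator with a fixed step per clock period; the step size, the test frequencies and the function names are hypothetical choices made for illustration.

```python
import math


def delta_encode(samples, step):
    """One bit per sample: 1 if the input is at or above the local estimate."""
    bits, estimate = [], 0.0
    for s in samples:
        bit = 1 if s >= estimate else 0       # comparator, clocked once per sample
        estimate += step if bit else -step    # local decoder (ideal integrator)
        bits.append(bit)
    return bits


def delta_decode(bits, step):
    """Integrate the digit stream to recover the audio signal."""
    out, estimate = [], 0.0
    for bit in bits:
        estimate += step if bit else -step
        out.append(estimate)
    return out


def sine(cycles_per_sample, n_samples=400):
    return [math.sin(2 * math.pi * cycles_per_sample * n)
            for n in range(n_samples)]


def max_tracking_error(signal, step):
    decoded = delta_decode(delta_encode(signal, step), step)
    return max(abs(a - b) for a, b in zip(signal, decoded))


if __name__ == "__main__":
    step = 0.05
    # Slow sine: maximum slope per sample is below the step, so it is tracked.
    print("slow sine error:", max_tracking_error(sine(0.005), step))
    # Fast sine: slope exceeds the step, so the decoder cannot follow it.
    print("fast sine error:", max_tracking_error(sine(0.05), step))
```

The signal is tracked to within one step only while its change per clock period does not exceed the step; doubling the clock rate or the step size doubles the slope that can be followed, at the cost of bit-rate or granular noise respectively.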
Fig. 5 - A delta-modulation coder and decoder



Various forms of adaptive and companded delta
modulation have been investigated. 28,29,30,31 The
claimed performances are comparable to those achieved
with instantaneously companded p.c.m. for bit-rates
between 20 and 40 kb/s. A system similar to that described 
in Reference 31 was constructed some years ago* and from 
recordings made at that time it has been possible to assess 
the quality achieved with this type of system. The system 
had an upper frequency limit of 2.5 kHz and its performance
has been recorded at transmission bit-rates of 19.2
kb/s, 38 kb/s and 56 kb/s. At each of these bit-rates the
performance was significantly poorer than with the near- 
instantaneously companded p.c.m. system at the same bit- 
rate. At 56 kb/s its performance was similar to that 
achieved with the companded p.c.m. system at a bit-rate of 
28 kb/s. Therefore, although this system is simpler to 
instrument than the companded p.c.m. system, its perfor- 
mance is not sufficiently high for it to be considered for 
application to broadcast quality speech contributions. 

* Unpublished work by Dr. C.J. Dalton.

5.3.3. Waveform-prediction coders

In a waveform-prediction coder those changes in the
waveform that can be accurately predicted are subtracted
from the original, leaving a smaller signal which therefore
requires fewer bits per sample. The techniques used to
generate the prediction use a digital transversal filter with
variable coefficients, as for waveform-prediction vocoders.

A schematic of a waveform-prediction coder, similar
to one devised by Dunn, is shown in Fig. 6. In this case
the prediction coefficients are coded and transmitted to the
decoder. In another system adaptive filters, with two
variable coefficients, are used and the coefficients are not
transmitted but are deduced from the signal at the decoder.

The transmitted speech quality achieved with these
devices is fair at 7 to 10 kb/s but, as with waveform-prediction
vocoders, devices of this type are relatively new
and are still being developed.

5.3.4. Other forms of waveform coding devices

Other forms of waveform coding devices rely to
some extent on fundamental characteristics of speech.

An interesting device was recently investigated by
de Mori and Serra where speech was segmented according
to the pitch period. Signals within a pitch interval were
coded by describing the amplitudes and positions of the
maxima and minima of the waveform. This information
was then transmitted in digital form together with a code
indicating how often a waveform similar to that occurred in
successive pitch periods. Pauses in speech were indicated
by coding their durations.
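
The first step of such a scheme, describing a waveform by the positions and amplitudes of its maxima and minima, can be sketched in a few lines of Python. This is an illustrative assumption about how the extrema might be located; it is not the method of de Mori and Serra, and the function name is hypothetical. Pitch segmentation, the repetition code and the coding of pauses are not attempted.

```python
def extrema(samples):
    """Return (position, amplitude) pairs for local maxima and minima.

    A flat-topped peak or trough is reported once, at its first sample.
    """
    points = []
    for i in range(1, len(samples) - 1):
        prev, cur, nxt = samples[i - 1], samples[i], samples[i + 1]
        is_max = cur > prev and cur >= nxt
        is_min = cur < prev and cur <= nxt
        if is_max or is_min:
            points.append((i, cur))
    return points


if __name__ == "__main__":
    waveform = [0.0, 0.6, 1.0, 0.4, -0.2, -0.9, -0.3, 0.5, 0.2]
    for position, amplitude in extrema(waveform):
        print(position, amplitude)
```

Only the turning points are retained, so the amount of data depends on the number of extrema in each pitch interval rather than on the sample rate.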

In tests with this system it was found that fair quality 
of speech was achieved at an average transmission rate of 
4 kb/s.* However, this technique is more suited to the 
computer storage of speech as, although it produces a low 
average bit-rate, the transmission rate might have to be 
averaged over a minute or more of speech. 

The technique for exploiting short pauses in speech 
described in Reference 33 was further investigated as it 
might be of general application to reduce the transmitted 
bit-rate for any speech coding system. This investigation is 
described fully in an Appendix (Section 9) and showed that
the technique was not practicable when applied to news 
contributions as the storage required would be too great 
and the benefits were small. 

Other waveform coding devices which were investigated
can be only briefly reported here. Vocom is
similar in principle to a channel vocoder but unable to distinguish
between voiced and unvoiced signals. At a transmitted
bit-rate of 1.2 kb/s the speech quality achieved was
inferior to that from a channel vocoder. Hadamard coding,
applied to speech, has been reported to give a similar saving
in bit-rate to that with companded p.c.m. but is more
complex. 36 A device using a split band approach with
separate limiting in each band gave a similar performance
to companded p.c.m. at 9.6 kb/s.

5.4. Summary of speech coding devices 

Earlier in this report it was shown that the transmission 
bit-rates that might be convenient for making 
speech contributions to programmes were 1.2 kb/s, 9.6 kb/s 
and 64 kb/s. 

Using currently available devices, it seems that only 
channel vocoders could be used at 1.2 kb/s and 9.6 kb/s, 
and that the quality of these would be poor by broadcast 
standards. At 64 kb/s it seems that good quality speech of 
up to 6 kHz bandwidth could be achieved using a waveform 
coding device based on a form of near-instantaneous 
companded p.c.m. 
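
As a rough check on these figures, the Python sketch below works out how many bits are available for each sample at the three bit-rates. It assumes sampling at the theoretical minimum of twice the audio bandwidth. That is an assumption made only for illustration: a practical coder samples somewhat faster and may also spend part of the bit-rate on companding information. The function name is hypothetical.

```python
def bits_per_sample(bit_rate, bandwidth_hz):
    """Bits available per sample when sampling at twice the bandwidth."""
    sampling_rate = 2 * bandwidth_hz  # assumed minimum (Nyquist) rate
    return bit_rate / sampling_rate


# The three transmission rates considered in this report, in bit/s,
# applied to a 6 kHz speech bandwidth.
for rate in (1200, 9600, 64000):
    print(rate, round(bits_per_sample(rate, 6000), 2))
```

On these assumptions 64 kb/s leaves a little over 5 bits per sample for a 6 kHz channel, whereas the two lower rates leave less than one bit per sample. This is consistent with direct waveform coding being considered only at the highest rate.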

In the foreseeable future it seems that some form of 
vocoder would have to be used to derive a transmission 
bit-rate of 1.2 kb/s from speech. These are complex devices, 
so this could be a costly alternative to using normal handsets 
and analogue transmission. At 9.6 kb/s it is possible that a 
simpler device, using some techniques similar to both vocoder 
and waveform coding devices, might be available. At 64 
kb/s it seems likely that future developments of some of the 
systems outlined in this report might be used to derive a 
good quality wideband speech channel for contributions to 
programmes. 

* These tests were carried out at the Polytechnic, Galileo Ferraris in 
Turin by courtesy of Mr. R. de Mori. 



6. Conclusions 

In this report the various means by which it might be 
possible to send broadcast-quality speech contributions over 
telephone circuits, using digital coding and transmission, 
have been reviewed. As yet no system of coding and trans- 
mission is readily available for providing speech circuits in 
this way, giving a speech quality sufficiently high for it to 
offer advantage over the quality normally achieved using an 
ordinary telephone handset. 

In future, developments might enable either higher 
quality speech to be obtained at a lower bit-rate or higher 
bit-rate circuits to be more widely available than they are at 
present. Either of these developments could enable 
improved telephone speech quality for broadcasting to be 
achieved. 



7. Recommendations 

It is recommended that research effort be directed 
towards low-bit-rate speech coding. The most rewarding 
area to be investigated would be a 64 kb/s coding system 
capable of giving high-quality wideband speech 
communication. 

It would also be valuable to investigate or at least to 
follow relevant developments in lower bit-rate coding 
systems working at rates around 10 kb/s. These might 
find application for use with privately rented circuits from 
the Post Office. Moreover, techniques developed for use at 
this bit-rate might enable improvements to be made in 
systems working at higher bit-rates. 

9. Appendix 
Possible Exploitation of Short Pauses in Speech 



To investigate whether it might be possible to exploit 
the short pauses in continuous speech to enable a reduction 
to be made in the average digit transmission rate for coded 
speech, twelve recordings of speech contributions were 
obtained. Four of these, which were typical, were analysed 
in detail using a level recorder working over a 50 dB range 
with a writing speed of 2000 mm/s. This technique 
enabled pauses lasting longer than 10 ms to be resolved. 

For each recording, the durations and distributions of 
the pauses were listed and processed for a range of pause 
thresholds from -30 dB to -45 dB with respect to peak 
programme level. By averaging the durations of the pauses 
over different time intervals, some indication of the 
temporal distribution of the pauses was obtained. 

The results for the minimum percentage of the time 
that could be classified as a pause, plotted as a function of 
the time interval, i.e. the duration of the extract analysed, 
are shown in Fig. 7 for a variety of threshold values. These 
results show that when the threshold is 30 dB below peak 
programme level about 20%-30% of the programme could 
be classified as pauses if they are averaged over an 
interval of 1 min. For lower threshold levels or shorter 
intervals the percentage of the time classified as pauses is 
considerably less. 

Fig. 7 - The minimum percentage of the time classified as a pause as a function of the time interval 
over which the speech was averaged 
(a) pause threshold -30 dB w.r.t. peak programme level (b) pause threshold -35 dB w.r.t. peak programme level 
(c) pause threshold -40 dB w.r.t. peak programme level (d) pause threshold -45 dB w.r.t. peak programme level 
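
The pause analysis described above was carried out from level-recorder traces. A minimal Python sketch of an equivalent calculation is given below, assuming the level has been sampled at regular intervals and expressed in dB relative to peak programme level. The function names, the sampling of the level and the use of overlapping (sliding) intervals are assumptions made for illustration.

```python
def pause_flags(levels_db, threshold_db, min_run):
    """Flag every sample lying in a run of at least min_run consecutive
    samples below the threshold. Shorter dips are not counted as pauses."""
    flags = [False] * len(levels_db)
    run_start = None
    # A final sentinel above any threshold closes a run at the end.
    for i, level in enumerate(list(levels_db) + [float("inf")]):
        if level < threshold_db:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_run:
                for j in range(run_start, i):
                    flags[j] = True
            run_start = None
    return flags


def min_pause_percentage(flags, window):
    """Smallest percentage of pause samples in any interval of `window`
    consecutive samples."""
    if window <= 0 or window > len(flags):
        raise ValueError("window must be between 1 and len(flags)")
    count = sum(flags[:window])
    best = count
    for i in range(window, len(flags)):
        count += flags[i] - flags[i - window]  # slide the interval by one
        best = min(best, count)
    return 100.0 * best / window
```

With the level sampled every 1 ms, for example, `min_run=10` would correspond to the 10 ms resolution quoted above, and repeating the calculation for several values of `window` and `threshold_db` gives curves of the kind plotted in Fig. 7.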

To assess the subjective impairment caused by using a 
threshold detector to control interruption of the speech 
signal in pauses, a simple noise gate was constructed with a 
time constant of about 1 ms. With the threshold of this 
gate set to -30 dB with respect to peak programme level, the 
speech was segmented and severely impaired. To enable all 
four programme items to be transmitted without impairment, 
a setting of -40 dB was necessary. With this setting 
of the noise gate threshold, it can be seen from Fig. 7 that, 
at best, 10% of the programme could be classified as a 
pause if these pauses were averaged over an interval of 
about 1 min. 

From this investigation it is concluded that no worthwhile 
reduction in bit-rate could be achieved by exploiting 
the short pauses that occur in speech. At best a 10% 
reduction in transmitted bit-rate could be achieved and to 
do this the transmitted bit-rate would have to be averaged 
over 30 s at least. To achieve this averaging several 
seconds of storage would be required and the resulting delay 
in speech transmission might not be tolerated. 
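
The order of magnitude of the storage can be checked with a short calculation, sketched here in Python. It rests on one simple worst-case assumption that is not taken from the measurements: within an averaging interval all of the speech arrives before any of the pauses, while the channel runs continuously at the reduced rate. The function name is hypothetical.

```python
def worst_case_delay(interval_s, pause_fraction):
    """Delay in seconds at the end of the speech when all the speech in an
    interval precedes all the pauses and the channel runs at
    (1 - pause_fraction) of the coder output rate."""
    channel_share = 1 - pause_fraction
    speech_s = interval_s * (1 - pause_fraction)
    # Backlog in the store, measured in seconds of coder output.
    backlog = speech_s * (1 - channel_share)
    # Time the reduced-rate channel needs to clear that backlog.
    return backlog / channel_share


print(worst_case_delay(30, 0.10))
```

For a 30 s interval containing 10% pauses this gives a delay of about 3 s, which agrees with the estimate of several seconds of storage.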